-
Notifications
You must be signed in to change notification settings - Fork 695
storage/ingest: Properly track ownership of underlying mem.Buffer #13888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pkg/frontend/v2/frontend_test.go
Outdated
| msg, err := resp.Next(ctx) | ||
| require.NoError(t, err) | ||
| msg.FreeBuffer() // We don't care about the contents of the buffer in the assertion below. | ||
| defer msg.Buffer().Free() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in this file aren't strictly needed (but they are if we incorporate #13609), but again, it's about upholding the contract: msg's buffer should no longer be referenced once msg.FreeBuffer is called, but the assertion below references it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand these additions to this test, but I just wanted to make sure your intention matches the obscure way defer works with chained method calls. (That Buffer() is called NOW, and Free() is called when the scope exits.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, but let's make that clear: 33a1570
b418c41 to
aef907b
Compare
seizethedave
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just a couple of questions.
pkg/mimirpb/testutil/buffer.go
Outdated
| } | ||
|
|
||
| func (b *memBufferWithInstrumentedRefCount) Free() { | ||
| b.refCount.Sub(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about panicking if this ever goes negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/frontend/v2/frontend_test.go
Outdated
| msg, err := resp.Next(ctx) | ||
| require.NoError(t, err) | ||
| msg.FreeBuffer() // We don't care about the contents of the buffer in the assertion below. | ||
| defer msg.Buffer().Free() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand these additions to this test, but I just wanted to make sure your intention matches the obscure way defer works with chained method calls. (That Buffer() is called NOW, and Free() is called when the scope exits.)
| buf = mem.SliceBuffer([]byte("fake data")) | ||
| } | ||
| ibuf := &memBufferWithInstrumentedRefCount{Buffer: buf} | ||
| ibuf.refCount.Add(1) // Match the refCount of buf.Buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"// Match the refCount of buf.Buffer" -- is this necessarily true? If this is test support code, maybe we need to assert it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we can't, there's no method on mem.Buffer to check the current ref count.
#### What this PR does Adds a `common.instrument-reference-leaks-pct` flag. When set to >0, a percentage of gRPC BufferHolder objects (most prominently, WriteRequest) will be instrumented so that, once the buffer's reference count reaches zero, if there are remaining references to the buffer, a panic is thrown. For this to work, either the object needs to come from a generated gRPC server using our custom global codec, or the newly introduced `mimirpb.Unmarshal` must be used. Since ingester storage doesn't receive WriteRequests from gRPC anymore but from Kafka, `storage/ingest` now calls `mimirpb.Unmarshal`. To-do: plumb this mechanism with the distributor too, which has its own unmarshaling path. #### Which issue(s) this PR fixes or relates to Follow up of #13573 Builds on top of #13888 #### Checklist - [x] Tests updated. - [ ] Documentation added. - [x] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR. - [x] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features. <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Introduces experimental instrumentation to detect leaked references to pooled gRPC buffers and integrates it across config, codec, and ingest paths. > > - New global gRPC codec with leak-tracking and `mimirpb.Unmarshal()`; `storage/ingest` now uses `mimirpb.Unmarshal`, and `TSDBBuilder.PushToStorageAndReleaseRequest` defers `req.FreeBuffer()` > - Adds experimental `common.instrument-reference-leaks.*` config (percentage, before_reuse_period, max_inflight_instrumented_bytes) with validation, inheritance, defaults, help, and docs > - Wires config into server startup; updates config inheritance machinery; adds focused tests for leak detection, RW2 ingest, and config inheritance; updates CHANGELOG > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 79c2ea0. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Taylor C <[email protected]>
What this PR does
storage/ingest#parallelStorageShardscurrently shrugs off lifetime management of the WriteRequest's underlying buffer. Although this doesn't really uphold the contract of mimirpb.WriteRequest, at the moment this doesn't matter in practice, because we don't pool those buffers in ingest storage, and thus we can leave the GC to figure out lifetimes.However, this is a blocker for #13609, in which we introduce manual memory management for a fraction of WriteRequests. We need the WriteRequest's mem.Buffer's reference counts to eventually reach zero, so we release back their memory.
This would also become a problem if we introduce pooling at some point.
After this change,
parallelStorageShardsproperly transfers ownership of those buffers from the original WriteRequests downstream, by increasing and decreasing each buffer's reference count accordingly.This is a bit involved, as each shard constructs "synthetic" WriteRequests that incorporates references to multiple buffers, and also, each buffer can be referenced by multiple of those synthetic WriteRequests. To track this shared ownership,
WriteRequest.sourceBufferHoldershas been added.Which issue(s) this PR fixes or relates to
Spun off from #13609
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]. If changelog entry is not needed, please add thechangelog-not-neededlabel to the PR.about-versioning.mdupdated with experimental features.Note
Strengthens memory management for protobuf buffers in ingest and tests to prevent leaks and use-after-free.
WriteRequest.sourceBufferHolders,AddSourceBufferHolder(), andFreeBuffer()to retain and release strong refs to sourcemem.Buffers; exposeBufferHolder.Buffer().PushToStorageAndReleaseRequest, deferrequest.FreeBuffer()and pass&request.BufferHolderinto batching.batchingQueue.AddToBatch()/AddMetadataToBatch()now accept a*BufferHolderand callAddSourceBufferHolder()so synthetic batches keep source buffers alive.msg.Buffer(),defer buf.Free(), andmsg.SetBuffer(nil). Newmimirpb/testutilinstruments buffer refcounts; pusher tests compare marshaled requests, ensure no buffer leaks, and use new AddToBatch signatures. Reflect-based tests include the newsourceBufferHoldersfield.Written by Cursor Bugbot for commit 8296fec. This will update automatically on new commits. Configure here.